In this document, we perform an exploratory data analysis of the Constitutional Referendum dataset.
The given dataset contains the referendum results (number of voters, vote distribution, etc.) stratified by municipality (i.e. Comune).
| | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| DESCREGIONE | ABRUZZO | ABRUZZO | ABRUZZO | ABRUZZO | ABRUZZO | ABRUZZO |
| DESCPROVINCIA | CHIETI | CHIETI | CHIETI | CHIETI | CHIETI | CHIETI |
| DESCCOMUNE | ALTINO | ARCHI | ARI | ARIELLI | ATESSA | BOMBA |
| ELETTORI | 2288 | 1785 | 831 | 939 | 8454 | 686 |
| ELETTORI_M | 1101 | 861 | 402 | 453 | 4121 | 344 |
| VOTANTI | 1496 | 1241 | 617 | 612 | 5860 | 467 |
| VOTANTI_M | 775 | 632 | 328 | 304 | 3006 | 239 |
| NUMVOTISI | 533 | 442 | 241 | 194 | 1952 | 168 |
| NUMVOTINO | 953 | 782 | 366 | 410 | 3836 | 297 |
| NUMVOTIBIANCHI | 2 | 3 | 6 | 1 | 45 | 2 |
| NUMVOTINONVALIDI | 8 | 14 | 4 | 7 | 27 | 0 |
| NUMVOTICONTESTATI | 0 | 0 | 0 | 0 | 0 | 0 |
The dataset contains most of the basic data needed for an initial analysis of the referendum. However, to gain further insight, and especially to be able to plot the results on a geographic map, we import an additional dataset (found here: http://ckan.ancitel.it/dataset/comuni-italiani-dati-territoriali-e-demografici ).
| | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Comune | ABANO TERME | ABBADIA CERRETO | ABBADIA LARIANA | ABBADIA SAN SALVATORE | ABBASANTA | ABBATEGGIO |
| ISTAT | 28001 | 98001 | 97001 | 52001 | 95001 | 68001 |
| Provincia | PADOVA | LODI | LECCO | SIENA | ORISTANO | PESCARA |
| SiglaProv | PD | LO | LC | SI | OR | PE |
| Regione | VENETO | LOMBARDIA | LOMBARDIA | TOSCANA | SARDEGNA | ABRUZZO |
| AreaGeo | Nord-Est | Nord-Ovest | Nord-Ovest | Centro | Isole | Sud |
| PopResidente | 19950 | 289 | 3200 | 6444 | 2747 | 400 |
| PopStraniera | 2037 | 17 | 156 | 632 | 72 | 16 |
| DensitaDemografica | 932.64 | 47.91 | 193.55 | 110.16 | 70.54 | 26.36 |
| SuperficieKmq | 21.408 | 6.199 | 16.673 | 58.994 | 39.847 | 15.402 |
| AltezzaCentro | 14 | 64 | 204 | 822 | 315 | 450 |
| AltezzaMinima | 9 | 62 | 199 | 307 | 269 | 190 |
| AltezzaMassima | 80 | 70 | 1700 | 1738 | 483 | 1150 |
| ZonaAltimetrica | Montagna Interna | Pianura | Montagna Interna | Montagna Interna | Collina Interna | Collina Interna |
| TipoComune | No capoluogo | No capoluogo | No capoluogo | No capoluogo | No capoluogo | No capoluogo |
| GradoUrbaniz | Elevato | Medio | Medio | Basso | Basso | Basso |
| IndiceMontanita | Non montano | Non montano | Totalmente montano | Totalmente montano | Totalmente montano | Totalmente montano |
| ZonaClimatica | E | E | E | E | C | D |
| ZonaSismica | 4 | 4 | 4 | 2 | 4 | 1 |
| ClasseComune | Polo di attrazione intercomunale | Area di cintura | Area periferica | Area periferica | Area intermedia | Area intermedia |
| Latitudine | 45.35944 | 45.31222 | 45.89917 | 42.88000 | 40.12500 | 42.22361 |
| Longitudine | 11.789444 | 9.592778 | 9.333611 | 11.677500 | 8.820000 | 14.011389 |
The dataset we use is obtained by merging these two tables. For the join to work properly, some pre-processing was necessary to reconcile the names of some municipalities. Because the mismatches were isolated, case-by-case issues, this reconciliation was done by hand.
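The reconciliation and join described above might look roughly like the following sketch in base R. The territorial rows are copied from the preview above, while the referendum-side rows and the variant spelling `ABBADIA S. SALVATORE` are hypothetical, invented only to illustrate a mismatch. The object names (`territorio`, `referendum`, `fixes`, `dati_toy`) are also assumptions, not the names used in the original analysis.

```r
# Territorial table: names and ISTAT codes copied from the preview above.
territorio <- data.frame(
  Comune = c("ABANO TERME", "ABBADIA CERRETO", "ABBADIA SAN SALVATORE"),
  ISTAT  = c(28001, 98001, 52001),
  stringsAsFactors = FALSE
)

# Referendum side: HYPOTHETICAL rows; the third spelling is invented
# for illustration only.
referendum <- data.frame(
  DESCCOMUNE = c("ABANO TERME", "ABBADIA CERRETO", "ABBADIA S. SALVATORE"),
  stringsAsFactors = FALSE
)

# Step 1: list the names that have no counterpart in the territorial table.
unmatched <- setdiff(referendum$DESCCOMUNE, territorio$Comune)

# Step 2: hand-written lookup (spelling found -> reference spelling).
fixes <- c("ABBADIA S. SALVATORE" = "ABBADIA SAN SALVATORE")
idx <- referendum$DESCCOMUNE %in% names(fixes)
referendum$DESCCOMUNE[idx] <- unname(fixes[referendum$DESCCOMUNE[idx]])

# Step 3: inner join on the reconciled municipality name.
dati_toy <- merge(referendum, territorio,
                  by.x = "DESCCOMUNE", by.y = "Comune")
```

Inspecting `unmatched` before writing the lookup keeps the manual work limited to the names that actually fail to match. On the full data, joining on the name alone may be ambiguous if two municipalities share a name, so adding the province to the join key could be a safer choice.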
In this section we show some exploratory plots.
First, we show the distribution of turnout, that is, the percentage of voters with respect to the total number of electors in each municipality. As can be seen, turnout was generally high.
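As a minimal sketch of how the turnout rate can be computed, the snippet below uses only the six municipalities shown in the first preview table. The data frame name `toy` and the column name `PERC_VOTANTI` applied at municipality level are assumptions; the bin width is arbitrary.

```r
# Six municipalities copied from the first preview table.
toy <- data.frame(
  DESCCOMUNE = c("ALTINO", "ARCHI", "ARI", "ARIELLI", "ATESSA", "BOMBA"),
  ELETTORI   = c(2288, 1785, 831, 939, 8454, 686),
  VOTANTI    = c(1496, 1241, 617, 612, 5860, 467)
)

# Turnout: voters divided by eligible electors, per municipality.
toy$PERC_VOTANTI <- toy$VOTANTI / toy$ELETTORI

# Histogram bins of width 5 percentage points; drop plot = FALSE to draw it.
h <- hist(toy$PERC_VOTANTI, breaks = seq(0, 1, by = 0.05), plot = FALSE)
```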
As for turnout by sex, the next plot shows that males generally voted more than females. We also plot the mean value of the two populations, even though both are highly skewed on the right.
We tested the difference between the two means using a t-test, which confirms the conjecture based on the histogram.
##
## Welch Two Sample t-test
##
## data: (dati$VOTANTI_M/dati$ELETTORI_M) and (dati$VOTANTI_F/dati$ELETTORI_F)
## t = 32.632, df = 15677, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.04383501 0.04943773
## sample estimates:
## mean of x mean of y
## 0.7081189 0.6614825
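A test of this kind can be reproduced in miniature as follows. This is only a sketch on the six municipalities from the first preview table, so its numbers will not match the output above. The columns `ELETTORI_F` and `VOTANTI_F` do not appear in the preview, so deriving them as total minus male is an assumption about how they were built.

```r
# Six municipalities copied from the first preview table.
toy <- data.frame(
  ELETTORI   = c(2288, 1785, 831, 939, 8454, 686),
  ELETTORI_M = c(1101, 861, 402, 453, 4121, 344),
  VOTANTI    = c(1496, 1241, 617, 612, 5860, 467),
  VOTANTI_M  = c(775, 632, 328, 304, 3006, 239)
)

# ASSUMPTION: female counts are the totals minus the male counts.
toy$ELETTORI_F <- toy$ELETTORI - toy$ELETTORI_M
toy$VOTANTI_F  <- toy$VOTANTI  - toy$VOTANTI_M

# Turnout rates by sex, one value per municipality.
rate_m <- toy$VOTANTI_M / toy$ELETTORI_M
rate_f <- toy$VOTANTI_F / toy$ELETTORI_F

# t.test() defaults to var.equal = FALSE, i.e. the Welch two-sample test.
res <- t.test(rate_m, rate_f)
```

The object `res` holds the statistic, degrees of freedom, p-value and confidence interval (for example `res$p.value` and `res$conf.int`).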
The following plot shows the vote distribution, stratified by geographic area. In general, the share of Pro votes does not exceed 50% of the total. The two main exceptions are the Central and North-East areas.
The number of blank, contested or invalid votes is generally very low (as expected). If we consider, for example, the municipalities where the percentage of invalid votes is higher than 5% (that is, in the right tail of the distribution), we can see that they are not really outliers. Instead, this “high” value is probably due to the small number of eligible voters.
| | 4600 | 5465 | 5572 |
|---|---|---|---|
| DESCREGIONE | PIEMONTE | PIEMONTE | PIEMONTE |
| DESCCOMUNE | BRIGA ALTA | SEROLE | VANZONE CON SAN CARLO |
| ELETTORI | 36 | 99 | 354 |
| VOTANTI | 21 | 56 | 217 |
| NUMVOTISI | 2 | 20 | 96 |
| NUMVOTINO | 17 | 33 | 103 |
| NUMVOTIBIANCHI | 0 | 0 | 0 |
| NUMVOTINONVALIDI | 2 | 3 | 18 |
| NUMVOTICONTESTATI | 0 | 0 | 0 |
| ClasseComune | Area ultra-periferica | Area periferica | Area intermedia |
| PERC_NONVALIDI | 0.09523810 | 0.05357143 | 0.08294931 |
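The values of `PERC_NONVALIDI` in the table are consistent with dividing the invalid votes by the number of voters (for instance 2 / 21 ≈ 0.0952 for Briga Alta). A minimal sketch of the 5% filter under that reading is shown below. The row for ALTINO is taken from the first preview table and added only to show a municipality that falls below the threshold; the data frame name `toy` is an assumption.

```r
# Three rows from the table above, plus ALTINO from the first preview.
toy <- data.frame(
  DESCCOMUNE       = c("BRIGA ALTA", "SEROLE", "VANZONE CON SAN CARLO", "ALTINO"),
  VOTANTI          = c(21, 56, 217, 1496),
  NUMVOTINONVALIDI = c(2, 3, 18, 8),
  stringsAsFactors = FALSE
)

# Share of invalid votes among the votes cast.
toy$PERC_NONVALIDI <- toy$NUMVOTINONVALIDI / toy$VOTANTI

# Keep only the municipalities above the 5% threshold.
high <- toy[toy$PERC_NONVALIDI > 0.05, ]
```

With so few voters, two or three invalid ballots are enough to cross the threshold, which matches the interpretation given in the text.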
The next geographic map highlights how the Pros and the Cons are distributed. The main (limited) areas where the Pros win are in Toscana, Emilia-Romagna and Trentino-Alto Adige.
More maps are available in the Tableau file.
| DESCREGIONE | ELETTORI | VOTANTI | NUMVOTISI | PERC_VOTANTI | PERC_SI |
|---|---|---|---|---|---|
| ABRUZZO | 1052049 | 722930 | 255001 | 0.6871638 | 0.3527326 |
| BASILICATA | 467000 | 293546 | 98924 | 0.6285782 | 0.3369966 |
| CALABRIA | 1549305 | 842992 | 275449 | 0.5441098 | 0.3267516 |
| CAMPANIA | 4566905 | 2689070 | 839692 | 0.5888167 | 0.3122611 |
| EMILIA-ROMAGNA | 3326910 | 2526230 | 1262484 | 0.7593322 | 0.4997502 |
| FRIULI-VENEZIA GIULIA | 952494 | 690717 | 267357 | 0.7251668 | 0.3870717 |
| LAZIO | 4402145 | 3044673 | 1108768 | 0.6916340 | 0.3641665 |
| LIGURIA | 1241469 | 865756 | 342671 | 0.6973642 | 0.3958055 |
| LOMBARDIA | 7480375 | 5552510 | 2452936 | 0.7422770 | 0.4417707 |
| MARCHE | 1189181 | 866233 | 385768 | 0.7284282 | 0.4453398 |
| MOLISE | 256600 | 164038 | 63695 | 0.6392751 | 0.3882942 |
| PIEMONTE | 3396378 | 2446664 | 1054749 | 0.7203745 | 0.4310968 |
| PUGLIA | 3280712 | 2024651 | 659354 | 0.6171377 | 0.3256630 |
| SARDEGNA | 1375735 | 859158 | 237280 | 0.6245084 | 0.2761774 |
| SICILIA | 4013248 | 2271850 | 639629 | 0.5660876 | 0.2815454 |
| TOSCANA | 2854129 | 2125053 | 1105769 | 0.7445539 | 0.5203489 |
| TRENTINO-ALTO ADIGE | 792504 | 572486 | 305322 | 0.7223762 | 0.5333266 |
| UMBRIA | 675610 | 496406 | 240346 | 0.7347523 | 0.4841722 |
| VALLE D’AOSTA | 99735 | 71717 | 30568 | 0.7190756 | 0.4262309 |
| VENETO | 3720717 | 2852591 | 1077247 | 0.7666778 | 0.3776381 |
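A regional summary like the one above can be obtained by summing the municipality counts within each region and then taking ratios. The sketch below uses only the nine municipalities that appear in the earlier previews, so its totals are far smaller than the real ones. The denominators are inferred from the published figures (for Abruzzo, 722930 / 1052049 ≈ 0.687 and 255001 / 722930 ≈ 0.353), and the object names `toy` and `by_region` are assumptions.

```r
# Municipality rows copied from the two preview tables shown earlier.
toy <- data.frame(
  DESCREGIONE = c(rep("ABRUZZO", 6), rep("PIEMONTE", 3)),
  ELETTORI    = c(2288, 1785, 831, 939, 8454, 686, 36, 99, 354),
  VOTANTI     = c(1496, 1241, 617, 612, 5860, 467, 21, 56, 217),
  NUMVOTISI   = c(533, 442, 241, 194, 1952, 168, 2, 20, 96),
  stringsAsFactors = FALSE
)

# Sum the counts within each region.
by_region <- aggregate(cbind(ELETTORI, VOTANTI, NUMVOTISI) ~ DESCREGIONE,
                       data = toy, FUN = sum)

# Ratios computed on the regional totals, not averaged over municipalities.
by_region$PERC_VOTANTI <- by_region$VOTANTI / by_region$ELETTORI
by_region$PERC_SI      <- by_region$NUMVOTISI / by_region$VOTANTI
```

Computing the ratios after aggregation weights each municipality by its size, which is generally what a regional turnout figure is meant to express.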
The next two plots display the national distribution of the votes. More than 13 million eligible voters (about one in three) did not express an opinion.